Finite Dimensional Infinite Constellations
In the setting of a Gaussian channel without power constraints, proposed by
Poltyrev, the codewords are points in an n-dimensional Euclidean space (an
infinite constellation) and the tradeoff between their density and the error
probability is considered. The capacity in this setting is the highest
achievable normalized log density (NLD) with vanishing error probability. This
capacity as well as error exponent bounds for this setting are known. In this
work we consider the optimal performance achievable in the fixed blocklength
(dimension) regime. We provide two new achievability bounds, and extend the
validity of the sphere bound to finite dimensional infinite constellations. We
also provide asymptotic analysis of the bounds: When the NLD is fixed, we
provide asymptotic expansions for the bounds that are significantly tighter
than the previously known error exponent results. When the error probability is
fixed, we show that as n grows, the gap to capacity is inversely proportional
(to first order) to the square root of n, where the proportionality constant
is given by the inverse Q-function of the allowed error probability times the
square root of 1/2. In analogy to a similar result in channel coding, the
dispersion of infinite constellations is 1/2 nat^2 per channel use. All our
achievability results use lattices and therefore hold for the maximal error
probability as well. Connections to the error exponent of the power constrained
Gaussian channel and to the volume-to-noise ratio as a figure of merit are
discussed. In addition, we demonstrate the tightness of the results numerically
and compare them with state-of-the-art coding schemes.
Comment: 54 pages, 13 figures. Submitted to IEEE Transactions on Information
Theory
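To make the fixed-error-probability result concrete, here is a minimal sketch that evaluates the first-order approximation delta(n, eps) ≈ delta* - sqrt(V/n) * Q^{-1}(eps) with dispersion V = 1/2 nat^2, which is the "gap proportional to Q^{-1}(eps) times sqrt(1/2) over sqrt(n)" statement above. It assumes the standard Poltyrev capacity delta* = (1/2) ln(1/(2*pi*e*sigma^2)) for noise variance sigma^2, which the abstract does not spell out, and it keeps only the first-order term, so it is indicative rather than a bound. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def poltyrev_capacity(sigma2: float) -> float:
    """Poltyrev capacity in nats for noise variance sigma2.

    Assumed standard form: delta* = 0.5 * ln(1 / (2 * pi * e * sigma2)).
    """
    return 0.5 * math.log(1.0 / (2.0 * math.pi * math.e * sigma2))


def q_inverse(eps: float) -> float:
    """Inverse of the Gaussian tail Q(x) = P(Z > x) for Z ~ N(0, 1)."""
    return NormalDist().inv_cdf(1.0 - eps)


def nld_first_order(n: int, eps: float, sigma2: float) -> float:
    """First-order approximation of the best NLD at dimension n and error probability eps.

    Uses dispersion V = 1/2 nat^2 per channel use; higher-order terms are dropped.
    """
    dispersion = 0.5
    gap = math.sqrt(dispersion / n) * q_inverse(eps)
    return poltyrev_capacity(sigma2) - gap


if __name__ == "__main__":
    sigma2 = 1.0
    for n in (100, 1000, 10000):
        gap = poltyrev_capacity(sigma2) - nld_first_order(n, 1e-3, sigma2)
        print(f"n={n}: gap to capacity ≈ {gap:.4f} nats")
```

Because the gap scales as 1/sqrt(n), a tenfold increase in dimension shrinks it by a factor of sqrt(10), roughly 3.16.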
Bridging Dense and Sparse Maximum Inner Product Search
Maximum inner product search (MIPS) over dense and sparse vectors has
progressed independently in a bifurcated literature for decades; the latter is
better known as top-k retrieval in Information Retrieval. This duality exists
because sparse and dense vectors serve different end goals, even though they
are manifestations of the same mathematical problem. In this
work, we ask if algorithms for dense vectors could be applied effectively to
sparse vectors, particularly those that violate the assumptions underlying
top-k retrieval methods. We study IVF-based retrieval, where vectors are
partitioned into clusters and only a fraction of clusters are searched during
retrieval. We conduct a comprehensive analysis of dimensionality reduction for
sparse vectors, and examine standard and spherical KMeans for partitioning. Our
experiments demonstrate that IVF serves as an efficient solution for sparse
MIPS. As byproducts, we identify two research opportunities and demonstrate
their potential. First, we cast the IVF paradigm as a dynamic pruning technique
and turn that insight into a novel organization of the inverted index for
approximate MIPS for general sparse vectors. Second, we offer a unified regime
for MIPS over vectors that have dense and sparse subspaces, and show its
robustness to query distributions.
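The IVF idea described above (partition vectors into clusters, then search only a fraction of them) can be sketched for sparse vectors as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: sparse vectors are Python dicts from coordinate to value, clustering is a simplified spherical KMeans (unit-norm centroids, inner-product assignment, naive initialization from the first documents), and the names `build_ivf`, `search`, and `nprobe` are hypothetical.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Tuple

SparseVec = Dict[int, float]  # coordinate index -> nonzero value


def dot(a: SparseVec, b: SparseVec) -> float:
    """Sparse inner product, iterating over the shorter vector."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(i, 0.0) for i, v in a.items())


def normalize(v: SparseVec) -> SparseVec:
    """Scale to unit L2 norm (returns a copy unchanged if the norm is zero)."""
    norm = sum(x * x for x in v.values()) ** 0.5
    return {i: x / norm for i, x in v.items()} if norm > 0 else dict(v)


def assign(docs: List[SparseVec], centroids: List[SparseVec]) -> Dict[int, List[int]]:
    """Map each document to the centroid with the largest inner product."""
    clusters: Dict[int, List[int]] = defaultdict(list)
    for doc_id, d in enumerate(docs):
        best = max(range(len(centroids)), key=lambda j: dot(d, centroids[j]))
        clusters[best].append(doc_id)
    return clusters


def build_ivf(docs: List[SparseVec], num_clusters: int, iters: int = 5):
    """Simplified spherical KMeans; empty clusters keep their previous centroid."""
    centroids = [normalize(d) for d in docs[:num_clusters]]  # naive initialization
    for _ in range(iters):
        clusters = assign(docs, centroids)
        for j, members in clusters.items():
            acc: Dict[int, float] = defaultdict(float)
            for m in members:
                for i, x in normalize(docs[m]).items():
                    acc[i] += x
            centroids[j] = normalize(dict(acc))
    return centroids, assign(docs, centroids)


def search(query: SparseVec, docs: List[SparseVec], centroids: List[SparseVec],
           clusters: Dict[int, List[int]], k: int, nprobe: int) -> List[Tuple[float, int]]:
    """Probe the nprobe highest-scoring clusters, then rank their members exactly."""
    probe = heapq.nlargest(nprobe, range(len(centroids)),
                           key=lambda j: dot(query, centroids[j]))
    candidates = [d for j in probe for d in clusters.get(j, [])]
    return heapq.nlargest(k, ((dot(query, docs[d]), d) for d in candidates))
```

With `nprobe` equal to the number of clusters, the search degenerates to exhaustive scoring; lowering `nprobe` reduces the number of inner products at the cost of possibly missing true top-k results.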
An Approximate Algorithm for Maximum Inner Product Search over Streaming Sparse Vectors
Maximum Inner Product Search or top-k retrieval on sparse vectors is
well-understood in information retrieval, with a number of mature algorithms
that solve it exactly. However, all existing algorithms are tailored to text
and frequency-based similarity measures. To achieve optimal memory footprint
and query latency, they rely on the near stationarity of documents and on laws
governing natural languages. We consider, instead, a setup in which collections
are streaming -- necessitating dynamic indexing -- and where indexing and
retrieval must work with arbitrarily distributed real-valued vectors. As we
show, existing algorithms are no longer competitive in this setup, even against
naive solutions. We investigate this gap and present a novel approximate
solution, called Sinnamon, that can efficiently retrieve the top-k results for
sparse real-valued vectors drawn from arbitrary distributions. Notably,
Sinnamon offers levers to trade off memory consumption, latency, and accuracy,
making the algorithm suitable for constrained applications and systems. We give
theoretical results on the error introduced by the approximate nature of the
algorithm, and present an empirical evaluation of its performance on two
hardware platforms and synthetic and real-valued datasets. We conclude by
laying out concrete directions for future research on this general top-k
retrieval problem over sparse vectors.
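The abstract compares Sinnamon against naive solutions for streaming collections of arbitrary real-valued sparse vectors. The sketch below shows one plausible naive exact baseline; it is our assumption of what such a baseline could look like, not Sinnamon and not code from the paper. It is a hash-map inverted index that supports inserting and deleting vectors at any time and scores queries term-at-a-time. The class and method names are hypothetical, and documents that share no coordinate with the query are simply not scored.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Tuple

SparseVec = Dict[int, float]  # coordinate index -> nonzero value


class DynamicInvertedIndex:
    """Naive exact top-k MIPS over a stream of sparse real-valued vectors.

    Postings map coordinate -> {doc_id: value}, so insert and delete cost
    O(nnz of the vector) and queries accumulate scores term-at-a-time.
    """

    def __init__(self) -> None:
        self.postings: Dict[int, Dict[int, float]] = defaultdict(dict)
        self.docs: Dict[int, SparseVec] = {}

    def insert(self, doc_id: int, vec: SparseVec) -> None:
        """Add a vector; re-inserting an existing doc_id replaces it."""
        if doc_id in self.docs:
            self.delete(doc_id)
        self.docs[doc_id] = dict(vec)
        for i, x in vec.items():
            self.postings[i][doc_id] = x

    def delete(self, doc_id: int) -> None:
        """Remove a vector and drop posting lists that become empty."""
        vec = self.docs.pop(doc_id, None)
        if vec is None:
            return
        for i in vec:
            plist = self.postings[i]
            plist.pop(doc_id, None)
            if not plist:
                del self.postings[i]

    def search(self, query: SparseVec, k: int) -> List[Tuple[float, int]]:
        """Exact top-k among documents sharing at least one coordinate with the query."""
        scores: Dict[int, float] = defaultdict(float)
        for i, q in query.items():
            for doc_id, x in self.postings.get(i, {}).items():
                scores[doc_id] += q * x
        return heapq.nlargest(k, ((s, d) for d, s in scores.items()))
```

Memory grows with the total number of nonzeros, and query cost grows with the length of every posting list the query touches, which hints at why approximate methods with tunable memory, latency, and accuracy are attractive in this setting.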
Primary hyperoxaluria: from gene defects to designer drugs?
A new variant of bit interleaved coded modulation (BICM) is proposed. In the
new scheme, called Parallel BICM, L identical binary codes are used in parallel
together with a mapper, a newly proposed finite-length interleaver, and a binary dither
signal. As opposed to previous approaches, the scheme does not rely on any
assumptions of an ideal, infinite-length interleaver. Over a memoryless
channel, the new scheme is proven to be equivalent to a binary memoryless
channel. Therefore, the scheme makes it easy to design coded modulation
schemes from a simple binary code designed for that binary channel.
The overall performance of the coded modulation scheme is analytically
evaluated based on the performance of the binary code over the binary channel.
The new scheme is analyzed from an information-theoretic viewpoint, where the
capacity, error exponent and channel dispersion are considered. The capacity of
the scheme is identical to the BICM capacity. The error exponent of the scheme
is numerically compared to a recently proposed mismatched-decoding exponent
analysis of BICM.
Comment: 19 pages, 15 figures. A shorter version will be presented at the 48th
Allerton Conference on Communication, Control, and Computing (Allerton 2010)
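To illustrate what "BICM capacity" refers to, here is a Monte Carlo sketch of sum_i I(B_i; Y) for uniform inputs, using I(B_i; Y) = 1 - E[log2(sum_{x in X} p(y|x) / sum_{x in X_b^i} p(y|x))], where X_b^i is the set of points whose i-th label bit equals the transmitted bit b. The constellation (Gray-labelled 4-PAM), the real AWGN channel, the sample count, and the function names are our assumptions for illustration; the sketch does not model the finite-length interleaver or the dither of Parallel BICM.

```python
import math
import random
from typing import List, Tuple

# Assumed example constellation: Gray-labelled 4-PAM (average energy 5).
POINTS: List[float] = [-3.0, -1.0, 1.0, 3.0]
LABELS: List[Tuple[int, int]] = [(0, 0), (0, 1), (1, 1), (1, 0)]


def log_sum_exp(xs: List[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def bicm_capacity(snr_db: float, num_samples: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate (bits/symbol) of sum_i I(B_i; Y) over real AWGN."""
    rng = random.Random(seed)
    es = sum(x * x for x in POINTS) / len(POINTS)
    sigma2 = es / (10.0 ** (snr_db / 10.0))
    num_bits = len(LABELS[0])
    loss = 0.0  # accumulates log2(sum_X p(y|x) / sum_{X_b^i} p(y|x)) over bits
    for _ in range(num_samples):
        t = rng.randrange(len(POINTS))  # uniform transmitted symbol
        y = POINTS[t] + rng.gauss(0.0, math.sqrt(sigma2))
        # Gaussian log-likelihoods up to a common additive constant
        metrics = [-(y - x) ** 2 / (2.0 * sigma2) for x in POINTS]
        log_all = log_sum_exp(metrics)
        for i in range(num_bits):
            b = LABELS[t][i]
            subset = [m for m, lab in zip(metrics, LABELS) if lab[i] == b]
            loss += (log_all - log_sum_exp(subset)) / math.log(2.0)
    return num_bits - loss / num_samples
```

At high SNR the estimate approaches log2(4) = 2 bits per symbol, and at very low SNR it approaches 0.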